Scaling Marginalized Importance Sampling to High-Dimensional State-Spaces via State Abstraction

Authors

Abstract

We consider the problem of off-policy evaluation (OPE) in reinforcement learning (RL), where the goal is to estimate the performance of an evaluation policy, πe, using a fixed dataset, D, collected by one or more policies that may be different from πe. Current OPE algorithms may produce poor OPE estimates under policy distribution shift, i.e., when the probability of a particular state-action pair occurring under πe is very different from the probability of the same pair occurring in D. In this work, we propose to improve the accuracy of OPE estimators by projecting the high-dimensional state-space into a low-dimensional state-space using concepts from the state abstraction literature. Specifically, we consider marginalized importance sampling (MIS) OPE algorithms, which compute state-action distribution correction ratios to produce their OPE estimate. In the original ground state-space, these ratios may have high variance, which may lead to high-variance OPE. However, we prove that in the lower-dimensional abstract state-space the ratios can have lower variance, resulting in lower-variance OPE. We then highlight the challenges that arise when estimating the abstract ratios from data, identify sufficient conditions to overcome these issues, and present a minimax optimization problem whose solution yields the abstract ratios. Finally, our empirical evaluation on difficult, high-dimensional state-space OPE tasks shows that the abstract ratios can make MIS OPE estimators achieve lower mean-squared error and be more robust to hyperparameter tuning than the ground ratios.
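As a rough sketch of the kind of estimator the abstract describes (an illustration, not the paper's implementation; the abstraction function phi, the ratio table, and the toy data below are hypothetical placeholders), marginalized importance sampling forms its estimate as a correction-ratio-weighted average of observed rewards, with the ratios looked up in the abstract state-space:

import numpy as np

# Minimal MIS OPE sketch: given dataset tuples (s, a, r) from the behavior
# distribution and correction ratios w(s, a) ~ d_pi_e(s, a) / d_D(s, a),
# the estimate of the evaluation policy's expected reward is a weighted mean.
def mis_estimate(rewards: np.ndarray, ratios: np.ndarray) -> float:
    """Self-normalized MIS estimate of expected reward under the evaluation policy."""
    weights = ratios / ratios.sum()
    return float(np.sum(weights * rewards))

def abstract_ratios(states, actions, phi, ratio_table):
    """Look up correction ratios defined over abstract states phi(s) rather than
    ground states s; the paper's point is that such ratios can have lower variance."""
    return np.array([ratio_table[(phi(s), a)] for s, a in zip(states, actions)])

# Hypothetical toy usage.
rng = np.random.default_rng(0)
states = rng.normal(size=(100, 8))           # high-dimensional ground states
actions = rng.integers(0, 2, size=100)
rewards = rng.normal(size=100)
phi = lambda s: int(s[0] > 0)                # placeholder abstraction function
ratio_table = {(g, a): 1.0 for g in (0, 1) for a in (0, 1)}  # placeholder ratios
print(mis_estimate(rewards, abstract_ratios(states, actions, phi, ratio_table)))

In the paper the abstract ratios are not read from a fixed table but are obtained by solving the minimax optimization problem mentioned in the abstract.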

Similar References

Adaptive Importance Sampling for Markov Chains on General State Spaces

Adaptive importance sampling involves successively estimating the function of interest and then constructing an importance sampling scheme built on the estimate. Here, we investigate such a scheme used in simulations of Markov chains derived from particle transport problems. Previous work had shown that for finite state spaces the convergence was exponential, which verified computational experience...
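A hedged illustration of that idea (my sketch, not the article's scheme; the exponential target, the tail-probability functional, and the histogram proposal are invented for the example): each round estimates the shape of the zero-variance density f(x) p(x) from the current weighted samples and then draws the next batch from that estimate.

import numpy as np

rng = np.random.default_rng(1)
p_pdf = lambda x: np.exp(-x) * (x > 0)        # target density: Exp(1)
f = lambda x: (x > 5.0).astype(float)         # quantity of interest: P(X > 5)

edges = np.linspace(0.0, 20.0, 81)            # histogram bins for the proposal
widths = np.diff(edges)
probs = np.full(80, 1.0 / 80)                 # start from a flat proposal

for it in range(4):
    # Sample from the piecewise-constant proposal: pick a bin, then a point in it.
    bins = rng.choice(80, size=5000, p=probs)
    x = edges[bins] + rng.random(5000) * widths[bins]
    w = p_pdf(x) / (probs[bins] / widths[bins])
    estimate = np.mean(w * f(x))
    # Re-estimate the proposal from the weighted samples: a histogram of f(x) p(x).
    hist, _ = np.histogram(x, bins=edges, weights=w * f(x))
    probs = (hist + 1e-6) / (hist + 1e-6).sum()   # smooth so every bin keeps support
    print(f"iteration {it}: estimate of P(X > 5) ~ {estimate:.5f}")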

State-dependent importance sampling schemes via minimum cross-entropy

We present a method to obtain state- and time-dependent importance sampling estimators by repeatedly solving a minimum cross-entropy (MCE) program as the simulation progresses. This MCE-based approach lends a foundation to the natural notion of stopping the change of measure when it is no longer needed. We use this method to obtain a state- and time-dependent estimator for the one-tailed probability of ...
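A minimal sketch of the minimum cross-entropy step under strong simplifications (a one-dimensional Gaussian family and a fixed threshold; the article's state- and time-dependent construction is more elaborate): for a rare-event probability, the MCE program has a closed-form solution, namely the likelihood-ratio-weighted moments of the samples that hit the event.

import numpy as np

rng = np.random.default_rng(2)
gamma = 3.0                                   # event threshold; target is N(0, 1)
mu, sigma = 0.0, 1.0                          # current proposal parameters

for it in range(6):
    x = rng.normal(mu, sigma, size=20000)
    # Likelihood ratio of the target N(0, 1) to the proposal N(mu, sigma^2),
    # zeroed outside the event {x > gamma}.
    log_w = -0.5 * x**2 + 0.5 * ((x - mu) / sigma) ** 2 + np.log(sigma)
    w = np.exp(log_w) * (x > gamma)
    estimate = w.mean()
    if w.sum() > 0:
        # Closed-form MCE update: weighted mean and std of the event samples.
        mu = np.sum(w * x) / w.sum()
        sigma = np.sqrt(np.sum(w * (x - mu) ** 2) / w.sum()) + 1e-6
    print(f"iteration {it}: P(X > {gamma}) ~ {estimate:.2e}, proposal N({mu:.2f}, {sigma:.2f}^2)")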

Efficient High-Dimensional Importance Sampling

The paper describes a simple, generic and yet highly accurate Efficient Importance Sampling (EIS) Monte Carlo (MC) procedure for the evaluation of high-dimensional numerical integrals. EIS is based upon a sequence of auxiliary weighted regressions which actually are linear under appropriate conditions. It can be used to evaluate likelihood functions and byproducts thereof, such as ML estimators...
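A minimal one-dimensional sketch of the EIS idea under heavy simplifications (a single period, a Gaussian auxiliary sampler, and an unweighted regression; the integrand and starting values are invented for the example): the auxiliary regression of the log-integrand on the sampler's sufficient statistics is linear, and its coefficients map directly back to the sampler's parameters.

import numpy as np

rng = np.random.default_rng(3)
log_phi = lambda x: -0.5 * x**2 + np.sin(x)   # log of the integrand to be integrated

mu, sigma = 0.0, 2.0                          # initial Gaussian importance sampler
for it in range(5):
    x = rng.normal(mu, sigma, size=2000)
    # Auxiliary regression: ordinary least-squares fit of log phi(x) on (1, x, x^2).
    X = np.column_stack([np.ones_like(x), x, x * x])
    c, b, a = np.linalg.lstsq(X, log_phi(x), rcond=None)[0]
    # Map the fitted quadratic a*x^2 + b*x + c back to Gaussian parameters.
    sigma = np.sqrt(-1.0 / (2.0 * a))         # the fitted curvature a is negative here
    mu = b * sigma ** 2

# Importance sampling estimate of the integral of exp(log_phi) with the fitted sampler.
x = rng.normal(mu, sigma, size=20000)
log_q = -0.5 * ((x - mu) / sigma) ** 2 - np.log(sigma * np.sqrt(2.0 * np.pi))
print(np.mean(np.exp(log_phi(x) - log_q)))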

Near Optimal Behavior via Approximate State Abstraction

The combinatorial explosion that plagues planning and reinforcement learning (RL) algorithms can be moderated using state abstraction. Prohibitively large task representations can be condensed such that essential information is preserved, and consequently, solutions are tractably computable. However, exact abstractions, which treat only fully-identical situations as equivalent, fail to present ...
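A small sketch of one common form of approximate state abstraction, assuming states are grouped when their action-values agree within a tolerance epsilon (the function and data below are illustrative, not the article's construction):

import numpy as np

def approx_q_abstraction(Q: np.ndarray, epsilon: float) -> np.ndarray:
    """Q has shape (num_states, num_actions). Returns an abstract-state id per state.

    Greedy clustering: a state joins an existing cluster if its Q-vector is within
    epsilon (max norm) of the cluster's representative, otherwise it starts a new one.
    """
    reps, labels = [], np.empty(len(Q), dtype=int)
    for s, q in enumerate(Q):
        for k, rep in enumerate(reps):
            if np.max(np.abs(q - rep)) <= epsilon:
                labels[s] = k
                break
        else:
            labels[s] = len(reps)
            reps.append(q)
    return labels

# Hypothetical usage: 6 states, 2 actions; states 0/1 and 2/3 are nearly identical.
Q = np.array([[1.0, 0.2], [1.01, 0.21], [0.5, 0.9], [0.49, 0.91], [0.0, 0.0], [2.0, 1.0]])
print(approx_q_abstraction(Q, epsilon=0.05))   # -> [0 0 1 1 2 3]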

Deep TAMER: Interactive Agent Shaping in High-Dimensional State Spaces

While recent advances in deep reinforcement learning have allowed autonomous learning agents to succeed at a variety of complex tasks, existing algorithms generally require a lot of training data. One way to increase the speed at which agents are able to learn to perform tasks is by leveraging the input of human trainers. Although such input can take many forms, real-time, scalar-valued feedbac...

Journal

Journal title: Proceedings of the ... AAAI Conference on Artificial Intelligence

Year: 2023

ISSN: 2159-5399, 2374-3468

DOI: https://doi.org/10.1609/aaai.v37i8.26128